feat: support Voyage embeddings via MongoDB Atlas (mongodb/ prefix) #206

Open
antronic wants to merge 1 commit into cocoindex-io:main from antronic:feat/mongodb-atlas-voyage-embeddings

@antronic

Description

Adds support for accessing Voyage AI embedding models through MongoDB Atlas.

When the configured embedding model name starts with the mongodb/ prefix (e.g.
mongodb/voyage-4-large), the embedder now:

  • Rewrites the model to LiteLLM's native Voyage provider (mongodb/voyage-4-large → voyage/voyage-4-large).
  • Sets api_base to https://ai.mongodb.com/v1, which is the Atlas-hosted Voyage embedding endpoint.

MongoDB Atlas serves Voyage models over the native Voyage API (same request/response
shape, authenticated with a VOYAGE_API_KEY bearer token), so routing through the
voyage/ provider with an overridden api_base is sufficient — no new provider or
response transformation is required. This was verified against LiteLLM's
VoyageEmbeddingConfig.get_complete_url, which honors a caller-supplied api_base.
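
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names, the tuple return shape, and the `build_embedding_kwargs` helper are assumptions made for the example, and only the `mongodb/` prefix, the `voyage/` provider prefix, and the Atlas URL come from the description.

```python
from typing import Optional, Tuple

MONGODB_PREFIX = "mongodb/"
VOYAGE_PREFIX = "voyage/"
ATLAS_API_BASE = "https://ai.mongodb.com/v1"  # endpoint named in the PR description


def resolve_mongodb_model(model: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: map mongodb/<name> to voyage/<name> plus the Atlas api_base.

    Any other model string is returned unchanged with no api_base override.
    """
    if model.startswith(MONGODB_PREFIX):
        return VOYAGE_PREFIX + model[len(MONGODB_PREFIX):], ATLAS_API_BASE
    return model, None


def build_embedding_kwargs(model: str, texts: list) -> dict:
    """Hypothetical helper: assemble keyword arguments for an embedding request."""
    resolved, api_base = resolve_mongodb_model(model)
    kwargs = {"model": resolved, "input": texts}
    if api_base is not None:
        # Only Atlas-hosted models get an overridden endpoint.
        kwargs["api_base"] = api_base
    return kwargs
```

Keeping the rewrite in one small function means non-Atlas model strings pass through untouched, which matches the claim that the change is additive.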

Changes

  • src/cocoindex_code/litellm_embedder.py: add _resolve_mongodb_model() helper; map
    mongodb/* → voyage/* in PacedLiteLLMEmbedder.__init__ and inject the Atlas
    api_base in run_embedding_request. Because the resolved model uses the voyage/
    prefix, the existing native-provider branch already skips forcing encoding_format /
    drop_params.
  • src/cocoindex_code/embedder_defaults.py: extend the curated Voyage defaults regex to
    (voyage|mongodb)/.+ so ccc init applies the same input_type (document/query)
    defaults to Atlas-hosted Voyage models.
  • EMBEDDINGS.md: document the mongodb/ configuration option.
  • Tests: add coverage in tests/test_litellm_embedder.py (routing + Atlas api_base)
    and tests/test_embedder_defaults.py (curated defaults match).
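
To illustrate the extended defaults pattern from the list above, here is a small sketch of how `(voyage|mongodb)/.+` could select the curated `input_type` defaults. The pattern is taken from the description; the use of `fullmatch`, the function name, and the shape of the returned dictionary are assumptions for the example and may differ from what embedder_defaults.py actually does.

```python
import re
from typing import Dict, Optional

# Pattern quoted in the PR description.
VOYAGE_DEFAULTS_PATTERN = re.compile(r"(voyage|mongodb)/.+")


def curated_input_types(model: str) -> Optional[Dict[str, str]]:
    """Hypothetical lookup: return input_type defaults for Voyage-style models.

    The key names below are illustrative only.
    """
    if VOYAGE_DEFAULTS_PATTERN.fullmatch(model) is None:
        return None
    return {"indexing_input_type": "document", "query_input_type": "query"}
```

Under these assumptions, both `voyage/voyage-4-large` and `mongodb/voyage-4-large` receive the same defaults, while a bare `mongodb/` with no model name does not match because `.+` requires at least one character.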

Example configuration

embedding:
  provider: litellm
  model: mongodb/voyage-4-large
envs:
  VOYAGE_API_KEY: your-atlas-model-api-key

Motivation and context

MongoDB Atlas now offers the Voyage AI Embedding and Reranking API at
https://ai.mongodb.com/v1. Users with an Atlas model API key could not point ccc at
this endpoint, since LiteLLM has no mongodb provider and the model string was passed
through unchanged. This change lets those users select Atlas-hosted Voyage models with a
simple mongodb/ prefix while reusing the existing Voyage code path and curated defaults.

Reference: https://www.mongodb.com/docs/voyageai/api-and-clients/

Breaking change

No. This is purely additive — existing voyage/, openai/, and other model strings are
unaffected.

Related issues

None.

Testing

  • uv run mypy . — no new issues on changed files.
  • uv run pytest tests/test_litellm_embedder.py tests/test_embedder_defaults.py tests/test_shared.py — pass.
  • prek run --all-files — the only failures were two pre-existing flaky daemon-lifecycle
    e2e tests (test_session_db_path_mapping, test_index_and_search_via_client), which
    pass when run in isolation and are unrelated to this change.

Route 'mongodb/<voyage-model>' model names through LiteLLM's voyage provider with api_base set to https://ai.mongodb.com/v1. MongoDB Atlas serves Voyage models over the native Voyage API using a VOYAGE_API_KEY bearer token, so the same input_type curated defaults apply.
@badmonster0 requested a review from georgeh0 on June 30, 2026 19:43
@badmonster0

Member

Thanks @antronic, @georgeh0 can help take a look!
